ggplot2

Quantitative Methodology (UPF)

Jordi Mas Elias

https://www.jordimas.cat/

Summary

  • Facet
  • Coordinates
  • Scale
  • Labels and themes

Warm up

R learning curve

Layers

Basic layers

Almost always, a ggplot consists of three layers:

    1. Dataframe
    2. Aesthetics
    3. Geometry
df |> 
  ggplot(aes(aesthetics)) +
  geometry() +
  facet() +
  scale() +
  theme() +
  ...
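
As a minimal sketch of the three basic layers only, assuming the mpg dataset that ships with ggplot2 (it is not part of the course data):

```r
library(ggplot2)

# 1. Dataframe: mpg (an assumption; the slides use other datasets)
# 2. Aesthetics: engine displacement on x, highway mileage on y
# 3. Geometry: one point per car
p <- mpg |>
  ggplot(aes(x = displ, y = hwy)) +
  geom_point()

p
```

Every optional layer is added to this skeleton with another `+`.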

Optional layers

Optionally, we add more layers, such as:

    1. Facet
    2. Coordinates
    3. Scale
    4. Theme
    5. Etc.

Facet

Facet

Single column: facet_wrap(facets=vars(v), ncol=1)

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  mutate(pp_per = pp / total_votantes * 100) |> 
  ggplot(aes(x = pp_per)) +
  geom_histogram() +
  facet_wrap(facets = vars(nombre_de_provincia), ncol = 1)

Facet

Single row: facet_wrap(facets=vars(v), nrow=1)

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  mutate(erc_per = erc_sobiranistes / total_votantes * 100,
         psoe_per = psoe / total_votantes * 100,
         hab = if_else(poblacion > 15000, "Ciutat", "Poble")) |> 
  ggplot(aes(x = erc_per, y = psoe_per, col = hab)) +
  geom_point() +
  facet_wrap(facets = vars(nombre_de_provincia), 
             nrow = 1, scales = "free")
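
The scales argument can also free a single axis. A small sketch with hypothetical toy data (the data frame df and its columns are invented for illustration), where "free_y" gives each panel its own y axis while x stays shared:

```r
library(ggplot2)

# Hypothetical toy data: two groups on very different y ranges
df <- data.frame(
  group = rep(c("a", "b"), each = 3),
  x = rep(1:3, times = 2),
  y = c(1, 2, 3, 100, 200, 300)
)

# scales = "free_y": independent y axes, common x axis
p_free <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(facets = vars(group), nrow = 1, scales = "free_y")

p_free
```

Free scales make each panel readable, but values can no longer be compared across panels at a glance.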

Facet

Default layout (neither nrow nor ncol): facet_wrap(facets=vars(v)) (CatSalut)

Code
library(lubridate)
covid <- read_csv("data/Dades_di_ries_de_COVID-19_per_comarca.csv")
covid |> 
  filter(NOM != "Sense especificar") |> 
  mutate(DATA = as.Date(DATA, format = "%d/%m/%Y"),
         MONTH = month(DATA),
         YEAR = year(DATA)) |> 
  group_by(NOM, MONTH, YEAR) |> 
  summarize(casos = sum(CASOS_CONFIRMAT)) |> 
  mutate(DATE = as.Date(paste0("01/", MONTH, "/", YEAR), format = "%d/%m/%Y")) |> 
  ggplot(aes(x = DATE, y = casos)) +
  geom_line() +
  facet_wrap(facets = vars(NOM), scales = "free")

Facet

Rows and cols in a grid: facet_grid()

Code
accidents |>
  ggplot(aes(x = edat)) +
  geom_histogram() +
  facet_grid(rows = vars(descripcio_dia_setmana),
             cols = vars(descripcio_torn))
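
Since the accidents data is not bundled with R, here is a comparable sketch on ggplot2's built-in mpg dataset (an assumption, chosen only so the code runs as is). Drive train defines the rows and model year the columns:

```r
library(ggplot2)

# 3 drive trains x 2 model years = a grid of 6 panels
p_grid <- mpg |>
  ggplot(aes(x = hwy)) +
  geom_histogram(bins = 20) +
  facet_grid(rows = vars(drv),
             cols = vars(year))

p_grid
```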

Coordinates

Coordinates

Flip axes: coord_flip()

Code
elecc19 |>
  filter(nombre_de_comunidad == "Andalucía") |> 
  mutate(cs_per = cs / poblacion * 100) |> 
  ggplot(aes(x = fct_reorder(nombre_de_provincia, cs_per), y = cs_per)) +
  geom_violin()

Code
elecc19 |>
  filter(nombre_de_comunidad == "Andalucía") |> 
  mutate(cs_per = cs / poblacion * 100) |> 
  ggplot(aes(x = fct_reorder(nombre_de_provincia, cs_per), y = cs_per)) +
  geom_violin() +
  coord_flip()

Coordinates

Change limits: coord_cartesian(ylim = c(30,40))

Code
rendacs |> 
  ggplot(aes(x = import_euros, y = index_gini, col = nom_districte)) +
  geom_point()

Code
rendacs |> 
  ggplot(aes(x = import_euros, y = index_gini, col = nom_districte)) +
  geom_point() +
  coord_cartesian(ylim = c(30, 40))

Easier: ylim(30, 40) / xlim(30000, 70000). Unlike coord_cartesian(), these drop the observations outside the limits instead of zooming.
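
The two approaches are not equivalent once a statistic is involved. A sketch with hypothetical toy data (df, g and y are invented here): coord_cartesian() only zooms, whereas ylim() removes out-of-range values before the boxplot is computed.

```r
library(ggplot2)

# Hypothetical toy data with one extreme value
df <- data.frame(g = "a", y = c(1, 2, 3, 4, 100))

base <- ggplot(df, aes(x = g, y = y)) +
  geom_boxplot()

# Zoom: all five values still feed the boxplot statistics
p_zoom <- base + coord_cartesian(ylim = c(0, 10))

# Scale limits: the value 100 is discarded (with a warning) before computing
p_clip <- base + ylim(0, 10)

layer_data(p_zoom)$middle  # median of 1, 2, 3, 4, 100
layer_data(p_clip)$middle  # median of 1, 2, 3, 4
```

For a plain scatter plot the visual result is the same, so the shortcut is safe there.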

Scale

X and Y scales

Scale function:

  • scale_x_ or scale_y_
  • discrete() or continuous() (and others).

Arguments of the function:

  • breaks: Position of the breaks.
  • labels: Label of the breaks.
  • name: Title of the axis.
  • limits: Limits of the scale.
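
A sketch that uses the four arguments at once, assuming ggplot2's built-in mpg dataset (not part of the course data):

```r
library(ggplot2)

p_scale <- mpg |>
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(
    name = "Engine displacement",     # title of the axis
    breaks = c(2, 4, 6),              # position of the breaks
    labels = c("2 L", "4 L", "6 L"),  # one label per break
    limits = c(1, 7)                  # limits of the scale
  )

p_scale
```

breaks and labels must have the same length when labels is given as a vector.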

X and Y scales

Numeric: scale_x_continuous()

Code
rendacs |> 
  ggplot(aes(x = import_euros, y = index_gini, col = nom_districte)) +
  geom_point() +
  scale_x_continuous(
    breaks = c(20000, 50000, 80000), 
    labels = c("20k", "50k", "80k")
  )

X and Y scales

Categorical: scale_x_discrete()

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  ggplot(aes(x = nombre_de_provincia, y = numero_de_mesas)) +
  geom_col() +
  scale_x_discrete(labels = c("BCN", "GIR", "LLEI", "TGN"),
                   name = "Província")

X and Y scales

Percentages: the variable must be a proportion between 0 and 1.

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  mutate(erc_per = erc_sobiranistes / total_votantes,
         psoe_per = psoe / total_votantes,
         hab = if_else(poblacion > 15000, "Ciutat", "Poble")) |> 
  ggplot(aes(x = erc_per, y = psoe_per, col = hab)) +
  geom_point() +
  scale_y_continuous(labels = scales::label_percent())
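
label_percent() returns a function that multiplies by 100 and appends "%". A short sketch of how it behaves on its own; the input vectors are made up for illustration:

```r
library(scales)

# Proportions between 0 and 1 (the default expectation)
default_labels <- label_percent()(c(0.1, 0.25))
default_labels   # "10%" "25%"

# accuracy controls the rounding
decimal_labels <- label_percent(accuracy = 0.1)(c(0.1, 0.25))
decimal_labels   # "10.0%" "25.0%"

# If the variable is already on a 0-100 scale, set scale = 1
already_scaled <- label_percent(scale = 1)(c(10, 25))
already_scaled   # "10%" "25%"
```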

Color and fill scales

Scale function: Brewer.

  • scale_color_brewer()
  • scale_fill_brewer()

Arguments of the function (see help):

  • type: "seq", "div" or "qual".
  • palette: name ("Greens", "Set1", "Spectral") or index (1, 2…).
  • direction: 1 or -1.
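
To see what direction does, a sketch on ggplot2's built-in mpg dataset (an assumption; any categorical fill would do) with a sequential palette in both orders:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = drv, fill = drv)) +
  geom_bar()

# direction = 1: palette in its original order (light to dark for "Greens")
p_fwd <- base +
  scale_fill_brewer(type = "seq", palette = "Greens", direction = 1)

# direction = -1: same colors, reversed order
p_rev <- base +
  scale_fill_brewer(type = "seq", palette = "Greens", direction = -1)

p_fwd
p_rev
```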

Color and fill scales

Scale function: Brewer (see web).

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  ggplot(aes(x = nombre_de_provincia, y = numero_de_mesas,
             fill = nombre_de_provincia)) +
  geom_col() +
  scale_fill_brewer(palette = "Accent")

Color and fill scales

Scale function: Gradient.

  • scale_color_gradient()
  • scale_fill_gradient()

Arguments of the function:

  • low: color of the lowest value.
  • high: color of the highest value.

Color and fill scales

Scale function: Gradient.

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  mutate(erc_per = erc_sobiranistes / total_votantes,
         psoe_per = psoe / total_votantes,
         pp_per = pp / total_votantes) |> 
  ggplot(aes(x = erc_per, y = psoe_per, col = pp_per)) +
  geom_point() +
  scale_color_gradient(low = "white", high = "darkblue")

Color and fill scales

Scale function: Manual.

  • scale_color_manual()
  • scale_fill_manual()

Arguments of the function:

  • values: color of each category.
  • labels: name of each category.

Color and fill scales

Scale function: Manual.

Code
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  mutate(erc_per = erc_sobiranistes / total_votantes,
         psoe_per = psoe / total_votantes,
         hab = if_else(poblacion > 15000, "Ciutat", "Poble")) |> 
  ggplot(aes(x = erc_per, y = psoe_per, col = hab)) +
  geom_point() +
  scale_color_manual(values = c("orange", "darkgreen"),
                     labels = c("Ciutat", "Poble"))
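
An unnamed values vector is matched to the categories by position, which is easy to get wrong. A sketch of the named alternative, using hypothetical toy data (df is invented; only the hab categories come from the slides):

```r
library(ggplot2)

# Hypothetical toy data with the two categories used above
df <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 1, 4, 3),
  hab = c("Poble", "Ciutat", "Poble", "Ciutat")
)

# Named values are matched by category name, not by order
p_manual <- ggplot(df, aes(x = x, y = y, col = hab)) +
  geom_point(size = 3) +
  scale_color_manual(values = c(Poble = "darkgreen", Ciutat = "orange"))

p_manual
```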

Color and fill scales

Useful websites for colors:

Other scales

  • scale_size()
  • scale_shape()
  • scale_alpha()
  • scale_linewidth()
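
These follow the same pattern as the position and color scales. A sketch with scale_size() on ggplot2's built-in mpg dataset (an assumption, not course data), where range sets the smallest and largest point size:

```r
library(ggplot2)

p_size <- mpg |>
  ggplot(aes(x = displ, y = hwy, size = cty)) +
  geom_point(alpha = 0.4) +
  scale_size(range = c(1, 6),    # smallest and largest point size
             name = "City mpg")  # title of the legend

p_size
```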

Labels and themes

Labels

labs()

  • title, subtitle, caption
  • x, y, col, fill
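
A sketch of labs() with these arguments, assuming ggplot2's built-in mpg dataset; the label texts are illustrative:

```r
library(ggplot2)

p_labs <- mpg |>
  ggplot(aes(x = displ, y = hwy, col = drv)) +
  geom_point() +
  labs(title = "Fuel economy by engine size",
       subtitle = "One point per car model",
       caption = "Source: mpg dataset shipped with ggplot2",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       col = "Drive train")  # title of the color legend

p_labs
```

The aesthetic arguments (x, y, col, fill) do the same job as the name argument of the corresponding scale.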

Ggplot themes

theme_minimal(), theme_light(), theme_classic()

Code
rendacs |> 
  ggplot(aes(x = import_euros, y = index_gini, col = nom_districte)) +
  geom_point() +
  theme_classic()

Other theme packages

WSJ, The Economist, Excel… styles from the ggthemes package.

Code
library(ggthemes)
rendacs |> 
  ggplot(aes(x = import_euros, y = index_gini, col = nom_districte)) +
  geom_point() +
  scale_color_wsj() +
  theme_wsj()

Modifying themes

Very time-consuming at the beginning.

See ggplot2 info.
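
As a starting point, a sketch that changes three theme elements on ggplot2's built-in mpg dataset (an assumption; the chosen elements are only examples):

```r
library(ggplot2)

p_theme <- mpg |>
  ggplot(aes(x = displ, y = hwy, col = drv)) +
  geom_point() +
  labs(title = "Fuel economy by engine size") +
  theme_minimal() +                        # complete theme first
  theme(legend.position = "bottom",        # then individual tweaks
        plot.title = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1))

p_theme
```

Order matters: a complete theme such as theme_minimal() added after theme() would overwrite the tweaks.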

Fully equipped example

Example

Code
options(scipen = 999) # avoid scientific notation in the axis labels
elecc19 |> 
  filter(nombre_de_comunidad == "Cataluña") |> 
  transmute(poblacion, 
            ERC = erc_sobiranistes / total_votantes,
            PSC = psoe / total_votantes) |> 
  pivot_longer(ERC:PSC, names_to = "partit", values_to = "perc") |> 
  ggplot(aes(x = log10(poblacion), y = perc, col = partit)) +
  geom_point(size = 2, alpha = 0.6, show.legend = FALSE) +
  facet_wrap(facets = vars(partit), 
             nrow = 1) +
  scale_color_manual(values = c("gold2", "firebrick2")) +
  scale_y_continuous(labels = scales::label_percent(), name = "Percentatge de vot") +
  scale_x_continuous(name = "Població", breaks = c(2, 3, 4, 5, 6),
                     labels = c(100, 1000, 10000, 100000, 1000000)) +
  labs(title = "Vot a ERC i PSC a Catalunya",
       subtitle = "Dades de vot per municipi a les eleccions generals de 2019",
       caption = "Font: Ministeri de l'Interior") +
  theme(text = element_text(size = 15))

Important

Use Cheat Sheet & Manuals: